Mapping protein binding sites by photoreactive fragment pharmacophores

Fragment screening is a popular strategy for generating viable chemical starting points, especially for challenging targets. Although fragments provide better coverage of chemical space and typically have a higher chance of binding, their weak affinity necessitates highly sensitive biophysical assays. Here, we introduce a screening concept that combines evolutionarily optimized fragment pharmacophores with a photoaffinity handle that enables high hit rates by LC-MS-based detection. The sensitivity of our screening protocol was further improved by a target-conjugated photocatalyst. We have designed, synthesized, and screened 100 diazirine-tagged fragments against three benchmark and three therapeutically relevant protein targets of different tractability. Our therapeutic targets included a conventional enzyme, the first bromodomain of BRD4, a protein-protein interaction represented by the oncogenic KRasG12D protein, and the yet unliganded N-terminal domain of the STAT5B transcription factor. We have discovered several fragment hits against all three targets and identified their binding sites via enzymatic digestion, structural studies and modeling. Our results revealed that this protocol outperforms the screening of traditional fully functionalized and photoaffinity fragments, both in better exploration of the available binding sites and in higher hit rates, even for difficult targets.

Among the detected PhP fragment hits, CA was mostly preferred by PhP003 (4.8%) that moderately labelled the other targets (>1.1% for each). BRD4-BD1 was mostly preferred by PhP053 (6.7% vs. >2%). In the case of KRas, PhP048 and PhP012 were the most selective compounds (29.4% and 15.8% vs. >3%, respectively). PhP092, PhP001 and PhP088 preferred Lyo (35.6%, 19.2% and 4.8% vs. >1%), while Myo was targeted selectively only by PhP082 (52.9% vs. >0.5%). STAT5B-NTD was most selectively labelled by PhP040, PhP077, PhP065 and PhP097 (75.0%, 20.0%, 6.5% and 5.7% vs. >3%, respectively). Note that a higher labeling efficiency does not necessarily translate to a stronger affinity in the secondary binding assays that are carried out without irradiation.

STAT5B-NTD structure, with residue coloring based on whether the photocatalyst can reach that area with Dexter energy transfer. Green: all possible lysine residues where the photocatalyst Ir-G2-PEG3-COOH can attach. Grey: the photocatalyst can activate fragments in the area. Yellow: the photocatalyst cannot activate fragments that bind in the area. (All calculations are based on the following information: (i) the photocatalyst linker has a length of approximately 15 Å, and (ii) the Dexter energy transfer has a range of 10 Å.²)

Supplementary Figure 2. MS spectrum of the photocatalyst-labeled STAT5B N-terminal domain.
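The coloring rule in the STAT5B-NTD structure caption above can be read as a simple distance test. The following is a minimal sketch, assuming that the two stated distances simply add up (about 15 Å of linker plus 10 Å of Dexter range, so 25 Å from a lysine attachment point) and that straight-line distances are adequate; the source does not state how the two values were combined. The function name, residue labels and coordinates are hypothetical and are not taken from the STAT5B-NTD structure.

```python
import math

LINKER_LENGTH_A = 15.0  # approximate photocatalyst linker length (from the caption)
DEXTER_RANGE_A = 10.0   # Dexter energy transfer range (from the caption)


def classify_residues(residues, lysines,
                      linker=LINKER_LENGTH_A, dexter=DEXTER_RANGE_A):
    """Return {residue label: colour} following the green/grey/yellow scheme.

    residues: dict mapping a residue label to (x, y, z) coordinates in Å
    lysines:  set of labels in `residues` where the photocatalyst may attach
    Assumption: a residue is reachable if it lies within linker + dexter
    of at least one lysine (straight-line distance, occlusion ignored).
    """
    reach = linker + dexter
    colours = {}
    for label, xyz in residues.items():
        if label in lysines:
            colours[label] = "green"
        elif any(math.dist(xyz, residues[k]) <= reach for k in lysines):
            colours[label] = "grey"
        else:
            colours[label] = "yellow"
    return colours


# Hypothetical toy coordinates, for illustration only.
toy = {"K10": (0.0, 0.0, 0.0), "L20": (20.0, 0.0, 0.0), "W30": (40.0, 0.0, 0.0)}
colours = classify_residues(toy, lysines={"K10"})
```

With these toy values, the residue 20 Å from the lysine falls inside the assumed 25 Å reach and the one 40 Å away does not. A real analysis would also have to account for linker flexibility and for parts of the protein that block the path.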

Supplementary Note 1. Compound characterization
The PhP library was synthesized following the general procedures available in the Methods section, based on our previous work.¹ Compounds PhP006, PhP015, PhP051, PhP056, PhP057 and PhP099 were provided courtesy of GSK.
Structures of the compounds are provided as SMILES strings and figures in Supplementary Data 1, as the "core fragment" (fragment without the amine handle, included in pharmacophore screening during library selection), "amine" (fragment with the amine handle), and "PhP" (fragment with the diazirine-type photoaffinity tag attached to the amine handle). ¹H NMR assignments and further analytical properties are listed here, while ¹H NMR spectra are collated in Supplementary Data 2.

PhP053:
N-((1H-imidazo[4,5-b]pyridin-2-yl)methyl)-3-(3-methyl-3H-diazirin-3-yl)propanamide (0.126 mmol, 32.5 mg, 84%) as an off-white solid.

Intact protein masses were recorded by LC-MS using an Agilent G6224 time-of-flight (ToF) Accurate Mass Series mass spectrometer, interfaced with an Agilent 1200 series liquid chromatography and sample handling system. The protein sample was injected using an Agilent 1200 series AutoSampler (Model No. G1367B) with a 10 µL injection volume and maintained at a temperature of 10 °C. Chromatography was carried out on an Agilent Bio-HPLC PLRP-S (1000 Å, 5 µm × 50 mm × 1.0 mm, PL1312-1502) reverse phase HPLC column at 70 °C. Using an Agilent 1200 series binary pump system (Model No. G1312B), the sample was eluted at 0.5 mL/min using a gradient system from Solvent A (water, 0.2% (v/v) formic acid) to Solvent B (acetonitrile, 0.2% (v/v) formic acid) according to the following conditions:
Elution gradient (% B) used for intact protein LC-MS.
The eluent was injected directly into an Agilent ToF mass spectrometer (Model No. G6224A) using a dual ESI source and scanning between 600-3200 Da with a scan rate of 1.03 s in positive mode. The following MS parameters were used: capillary voltage limit: 4200; desolvation temperature: 340 °C; drying gas flow: 8.0 L/min. Data acquisition was carried out in 2 GHz Extended Dynamic range mode. Spectra were processed using Mass Hunter Qualitative Analysis™ B06.00 (Agilent) software with the Maximum Entropy method employed. The total ion chromatograms (TIC) were extracted (region containing protein) and the summed scans were deconvoluted (using a maximum entropy algorithm) over an m/z range, with an expected mass range dependent on the protein.
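When reading a deconvoluted intact-protein spectrum, a singly labelled protein is expected at the unmodified protein mass plus the fragment mass minus N2, because a diazirine loses N2 on photoactivation before the carbene inserts. The sketch below works this out for PhP053. The molecular formula C12H14N6O is an assumption derived here from the compound name, not a value given in the source, and the calculation assumes a single adduct with no other losses. As a consistency check, the formula reproduces the reported 32.5 mg for 0.126 mmol.

```python
# Rounded standard atomic weights (average masses, g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def average_mass(formula):
    """Average molecular mass for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


# Assumption: formula derived from the PhP053 name, not stated in the source.
PHP053 = {"C": 12, "H": 14, "N": 6, "O": 1}

php053_mass = average_mass(PHP053)            # about 258.29 g/mol
n2_mass = average_mass({"N": 2})              # N2 lost on diazirine photolysis
adduct_shift = php053_mass - n2_mass          # expected shift per label, about 230.27 Da
mass_for_reported_amount = 0.126 * php053_mass  # mg for 0.126 mmol, about 32.5 mg
```

A doubly labelled protein would appear at twice `adduct_shift` above the unmodified mass under the same assumptions.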

Screening against STAT5B-NTD
For screening against STAT5B-NTD, we used a UPLC-MS system that consisted of a Waters ACQUITY UPLC I-Class setup coupled with a Waters ACQUITY UPLC Peptide BEH C18 Column (130 Å, 1.7 µm, 2.1 mm × 100 mm), connected to a Waters Xevo G2-XS QToF instrument equipped with a Waters Z-spray ESI source. During the analysis, the column temperature was maintained at a constant 60 °C, and a sample volume of 3 µL was injected for each analysis. We utilized a gradient elution method with two eluents: eluent A, which consisted of 0.1% formic acid in water, and eluent B, composed of 0.1% formic acid in acetonitrile. The flow rate was set at 0.6 mL/min, and each measurement run lasted for 5.4 minutes. High-quality LC-MS grade solvents were sourced from Merck (Darmstadt, Germany). The gradient program is included below. Data acquisition was conducted in positive ion mode within the 100-2000 m/z (mass-to-charge ratio) range. The MS parameters were configured as follows: a source temperature of 150 °C, a capillary voltage of 3.0 kV, and a desolvation temperature of 550 °C.
Nitrogen was employed as the atmospheric pressure ionization gas, supplied by a Genius 3020 Nitrogen generator from Peak Scientific. To control the analytical equipment and process the acquired data, we utilized the Waters Masslynx V4.2 SCN996 software package. Spectrum deconvolution was performed using Maximum Entropy modelling (MaxEnt) with the following parameters: a mass range of 7000-30000 Da, 1.00 Da/channel, a 0.75 Da width at half height with a uniform Gaussian damage model, and iterative refinement to convergence (approximately 40 iterations).

After the labelling was completed, 45 μL of the sample and 10 μL of 0.2% (w/v) RapiGest SF (Waters, Milford, USA) solution buffered with 50 mM ammonium bicarbonate were mixed (pH = 7.8), and 6.8 μL of 45 mM dithiothreitol (DTT) in 100 mM NH4HCO3 was added and kept at 37.5 °C for 30 min. After cooling the sample to room temperature, 8 μL of 100 mM iodoacetamide in 100 mM NH4HCO3 was added and the sample was placed in the dark at room temperature for 30 min. The reduced and alkylated protein was then digested with 5.5 μL (1 mg/mL) of Trypsin/Lys C mix (Promega, Madison, USA; the enzyme-to-protein ratio was 1:10). The sample was incubated at 37 °C overnight. To degrade the surfactant, 6.8 μL of formic acid (500 mM) solution was added to the digested protein sample to obtain a final concentration of 40 mM (pH ≈ 2), and the sample was incubated at 37 °C for 45 min. For LC-MS analysis, the acid-treated sample was centrifuged for 5 min at 13 000 rpm and the supernatant was pipetted into a microvial.
To get more precise information on the structure, samples were further analyzed by a Triple TOF 5600+ hybrid Quadrupole-TOF LC/MS/MS system (Sciex, MA, USA) equipped with a DuoSpray IonSource coupled with a Shimadzu Prominence LC20 UFLC (Shimadzu, Japan) system consisting of a quaternary pump, an autosampler and a thermostated column compartment.
Data acquisition and processing were performed using Analyst TF software version 1.7.1 (AB Sciex Instruments, CA, USA). Chromatographic separation was achieved on the Discovery® BIO Wide Pore C-18-5 (250 mm × 2.1 mm, 5 μm, 300 Å) HPLC column. The sample was eluted in gradient elution mode using solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in ACN). The initial condition was 5% B for 7 min, followed by a linear gradient to 90% B by 48 min; 90% B was retained from 55 to 63 min, and from 63 to 65 min the system returned to the initial condition of 5% B, which was retained for 10 min. The flow rate was set to 0.2 mL/min. The column temperature was 40 °C and the injection volume was 15 µL. Nitrogen was used as the nebulizer gas (GS1), heater gas (GS2), and curtain gas, with the optimum values set at 35, 35 and 35 (arbitrary units), respectively. The source temperature was 350 °C and the spray voltage was set to 5000 V.
Advanced Information Dependent Acquisition (IDA) mode was used on the TripleTOF 5600+ system to obtain MS/MS spectra on the 8 most abundant parent ions present in the TOF survey scan. In the IDA LC-MS/MS experiment, the mass spectra and tandem mass spectra were recorded in "high-sensitivity" mode with a resolution of ~35,000 full-width half-maximum.
In the first period (positive TOF MS mode), the data were acquired in the mass range of m/z = 300 to 2500, with 0.1 s accumulation time. The declustering potential was set to 60 V. The intensity threshold for precursor ion selection in TOF survey scan mode was 1000 cps. In the MS2 experiment (Product Ion scan mode), the mass range was m/z = 50 to 3000, with an accumulation time of 0.1 s.
PeakView® V.2.2 software (version 2.2, Sciex) and Biologics Explorer software (version 3.0.3, Sciex) were used to assign and evaluate the peaks in the MS/MS spectra.

Sample preparation and data acquisition for STAT5B-NTD
After the labelling was completed, 50 μL of the sample and 10 μL of 0.2% (w/v) RapiGest SF (Waters, Milford, USA) solution buffered with 50 mM ammonium bicarbonate were mixed (pH = 7.8), and 3.5 μL of 45 mM dithiothreitol (DTT) in 100 mM NH4HCO3 was added and kept at 37.5 °C for 30 min. After cooling the sample to room temperature, 4.5 μL of 100 mM iodoacetamide in 100 mM NH4HCO3 was added and the sample was placed in the dark at room temperature for 30 min. The reduced and alkylated protein was then digested with 7 μL (1 mg/mL) of trypsin (Sigma, St Louis, MO, USA; the enzyme-to-protein ratio was 1:10). The sample was incubated at 37 °C overnight. To degrade the surfactant, 6 μL of formic acid (500 mM) solution was added to the digested protein sample to obtain a final concentration of 40 mM (pH ≈ 2), and the sample was incubated at 37 °C for 45 min. For LC-MS analysis, the acid-treated sample was centrifuged for 5 min at 13 000 rpm and the supernatant was pipetted into a microvial.
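The stated final formic acid concentration can be checked from the pipetted volumes. The following is a small worked example, assuming that the volumes are additive and that nothing is lost during the overnight incubation; the function name is illustrative only.

```python
def final_concentration_mM(stock_mM, added_uL, other_volumes_uL):
    """Concentration of a reagent after adding it to the pooled sample.

    Assumption: volumes are additive and there is no evaporation.
    """
    total_uL = added_uL + sum(other_volumes_uL)
    return stock_mM * added_uL / total_uL


# Volumes from the STAT5B-NTD protocol above (µL):
# sample, RapiGest SF, DTT, iodoacetamide, trypsin.
stat5b_volumes_uL = [50.0, 10.0, 3.5, 4.5, 7.0]

# 6 µL of 500 mM formic acid added to the digest.
formic_acid_mM = final_concentration_mM(500.0, 6.0, stat5b_volumes_uL)
```

Under these assumptions the result is roughly 37 mM, in line with the nominal 40 mM quoted in the protocol.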
To get more precise information on the structure, samples were further analyzed by a Triple TOF 5600+ hybrid Quadrupole-TOF LC/MS/MS system (Sciex, MA, USA) equipped with a DuoSpray IonSource coupled with a Shimadzu Prominence LC20 UFLC (Shimadzu, Japan) system consisting of a quaternary pump, an autosampler and a thermostated column compartment.
Data acquisition and processing were performed using Analyst TF software version 1.7.1 (AB Sciex Instruments, CA, USA). Chromatographic separation was achieved on the Discovery® BIO Wide Pore C-18-5 (250 mm × 2.1 mm, 5 μm, 300 Å) HPLC column. The sample was eluted in gradient elution mode using solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in ACN). The initial condition was 5% B for 7 min, followed by a linear gradient to 90% B by 48 min; 90% B was retained from 55 to 63 min, and from 63 to 65 min the system returned to the initial condition of 5% B, which was retained for 10 min. The flow rate was set to 0.2 mL/min. The column temperature was 40 °C and the injection volume was 15 µL. Nitrogen was used as the nebulizer gas (GS1), heater gas (GS2), and curtain gas, with the optimum values set at 35, 35 and 35 (arbitrary units), respectively. The source temperature was 350 °C and the spray voltage was set to 5000 V.
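The gradient program described above can be written as a piecewise-linear function of time. This is a sketch rather than the instrument method: the text gives "48 min" for the ramp but has the 90% B hold starting at 55 min, so the end of the ramp is left as a parameter. The default of 55 min assumes that 48 min is the ramp duration after the 7 min initial hold; passing 48 gives the literal reading, in which 90% B is simply held longer. The function name is hypothetical.

```python
def percent_b(t_min, ramp_end_min=55.0):
    """Percentage of solvent B at time t_min (minutes) for the 75 min program.

    Assumption: ramp_end_min is when 90% B is first reached (55 by default,
    i.e. a 48 min ramp after the 7 min hold; use 48.0 for the literal reading).
    """
    if not 0.0 <= t_min <= 75.0:
        raise ValueError("time outside the 0-75 min program")
    # (time in min, % B) breakpoints taken from the description above.
    points = [(0.0, 5.0), (7.0, 5.0), (ramp_end_min, 90.0),
              (63.0, 90.0), (65.0, 5.0), (75.0, 5.0)]
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover 0-75 min")


# Sample the program every 5 min as (time, % B) pairs.
profile = [(t, percent_b(float(t))) for t in range(0, 76, 5)]
```

Both readings agree from 55 min onward, so the choice only affects the slope of the ramp and therefore the expected retention times.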
Advanced Information Dependent Acquisition (IDA) mode was used on the TripleTOF 5600+ system to obtain MS/MS spectra on the 8 most abundant parent ions present in the TOF survey scan. In the IDA LC-MS/MS experiment, the mass spectra and tandem mass spectra were recorded in "high-sensitivity" mode with a resolution of ~35,000 full-width half-maximum.
In the first period (positive TOF MS mode), the data were acquired in the mass range of m/z = 300 to 2500, with 0.1 s accumulation time. The declustering potential was set to 60 V. The intensity threshold for precursor ion selection in TOF survey scan mode was 1000 cps. In the MS2 experiment (Product Ion scan mode), the mass range was m/z = 50 to 3000, with an accumulation time of 0.1 s.
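The precursor selection rule stated above (the 8 most abundant ions in the survey scan, above 1000 cps, within m/z 300 to 2500) can be illustrated with a short sketch. It is a simplification and not the vendor's IDA implementation: any further criteria the acquisition software may apply, such as charge-state filters or exclusion lists, are not described in the text and are left out. The function name and the peak list are hypothetical.

```python
def select_precursors(survey_peaks, top_n=8, threshold_cps=1000.0,
                      mz_range=(300.0, 2500.0)):
    """Pick the most abundant survey-scan ions for MS/MS.

    survey_peaks: iterable of (m/z, intensity in cps) pairs
    Keeps ions inside mz_range whose intensity exceeds threshold_cps,
    then returns the top_n of them, ordered by decreasing intensity.
    """
    low, high = mz_range
    eligible = [(mz, intensity) for mz, intensity in survey_peaks
                if low <= mz <= high and intensity > threshold_cps]
    eligible.sort(key=lambda peak: peak[1], reverse=True)
    return eligible[:top_n]


# Hypothetical survey scan, for illustration only.
survey = [(350.2, 500.0), (420.7, 12000.0), (512.3, 3000.0), (615.8, 950.0),
          (701.4, 8000.0), (802.9, 1500.0), (910.1, 20000.0), (1050.6, 2500.0),
          (1203.3, 4000.0), (1500.0, 1100.0), (1750.5, 6000.0), (2600.0, 9000.0)]
selected = select_precursors(survey)
```

In this toy scan, two ions fall below the threshold and one lies outside the survey range, which leaves nine candidates; the weakest of those is dropped to give eight precursors.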
PeakView® V.2.2 software (version 2.2, Sciex) and Biologics Explorer software (version 3.0.3, Sciex) were used to assign and evaluate the peaks in the MS/MS spectra.

Supplementary Table 1. Number of PhP fragment hits against different protein targets.

Supplementary Table 2. Number of proteins labelled by the proteomics probes of ref. 50 most similar to PhP fragment hits.